Association between Taxonomic Composition of Gut Microbiota and Host Single Nucleotide Polymorphisms in Crohn’s Disease Patients from Russia

Crohn’s disease (CD) is a chronic relapsing inflammatory bowel disease of unknown etiology. Genetic predisposition and dysbiotic gut microbiota are important factors in the pathogenesis of CD. In this study, we analyzed the taxonomic composition of the gut microbiota and the genotypes of 24 single nucleotide polymorphisms (SNPs) associated with the risk of CD. The studied cohorts included 96 CD patients and 24 healthy volunteers from Russia. Statistically significant differences were found in the allele frequencies of 8 SNPs and in the taxonomic composition of the gut microbiota in CD patients compared with controls. In addition, two types of gut microbiota communities were identified in CD patients. The main distinguishing driver families of the first community type are Bacteroidaceae and unclassified members of the Clostridiales order, whereas the second type is characterized by an increased abundance of Streptococcaceae and Enterobacteriaceae. Differences in the allele frequencies of rs9858542 (BSN), rs3816769 (STAT3), and rs1793004 (NELL1) were also found between groups of CD patients with different types of microbiota communities. These findings confirm the complex multifactorial nature of CD.


Introduction
Crohn's disease (CD) is a chronic relapsing disease characterized by inflammation of various regions of the gastrointestinal tract, mainly the small and large intestines. While the etiology of this disease is still unclear, it is known to be multifactorial. The pathogenesis of CD depends on environmental factors, genetic predisposition, individual immune response, and intestinal microbiota.
Through the development of DNA sequencing technology in recent decades, a significant amount of data on the intestinal microbiota in health and disease has accumulated. It is known that inflammatory bowel diseases (IBD) mainly affect the regions of the gastrointestinal tract with the maximum density of the bacterial population (colon and small intestine). Many studies confirm the association of microbiota composition with IBD [1][2][3][4][5], including in the Russian population [6][7][8][9]. The microbiota of IBD patients is most often characterized by reduced alpha diversity, a decrease in the abundance of Firmicutes and Bacteroidetes, and an increase in Proteobacteria, E. coli in particular. At the functional level, these
Table 1. Cont. Crohn's disease activity index (CDAI): mildly active (150-220 points), 68.75%; moderately active (221-450 points), 25%; severely active (>451 points), 6.25%.

Gut Microbiota of CD Patients and Healthy Volunteers
The number of sequencing read pairs obtained from fecal samples of CD patients and healthy controls ranged from 53,175 to 182,362 (median 91,061). Raw reads were deposited in the NCBI SRA under accession number PRJNA938107 in the fastq format. After merging, quality control, removal of chimeric reads, and rarefying, 20,938 reads per sample remained.

Shannon's alpha diversity index and the number of observed operational taxonomic units (OTUs) were significantly reduced in CD patients compared with controls (Figure 1A). A decreased abundance of the families Clostridiaceae, Coriobacteriaceae, and Rikenellaceae and an increased representation of Lactobacillaceae, Enterococcaceae, Streptococcaceae, and Enterobacteriaceae were also found (Figure 1B).
The taxonomic composition of CD patients' microbiota differed depending on the CDAI. The Firmicutes phylum and the Micrococcaceae and Enterococcaceae families showed significant positive correlations with CD activity (Figure 2A). Significant negative correlations with the severity of the disease were found for the Bacteroidetes phylum and the Erysipelotrichaceae, [Odoribacteraceae], Rikenellaceae, Coriobacteriaceae, Bacteroidaceae, and Porphyromonadaceae families (Figure 2A). When CD patients were divided into three groups according to disease activity, significant differences in the representation of three families were revealed: Micrococcaceae increased with increasing CD severity, while the abundance of Coriobacteriaceae and Bacteroidaceae decreased (Figure 2B).
Based on the Dirichlet multinomial mixtures method, two types of microbiota can be distinguished according to the taxonomic composition (Figure 3). The first community type (I) included 61 CD patients and all 24 controls, while the second type (II) included 35 CD patients. Thus, the frequency of occurrence of microbiota types in CD patients and controls is significantly different (p = 0.0003, Fisher's exact test). The main driver representatives (the most abundant in these communities) of the first community type are the families Lachnospiraceae, Ruminococcaceae, and Bacteroidaceae, and an unclassified member of the order Clostridiales (Figure 4A), while the second type is determined by Lachnospiraceae, Streptococcaceae, Ruminococcaceae, and Enterobacteriaceae (Figure 4B).
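The comparison of community-type frequencies between CD patients and controls can be illustrated with a minimal sketch of a two-sided Fisher's exact test on the 2 × 2 table of counts given in the text (61 and 35 CD patients, 24 and 0 controls). The function name and the pure-Python implementation are illustrative assumptions, not the software used in the study, and the exact p value may differ slightly from the reported one depending on how the two-sided test is defined.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    total_tables = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / total_tables

    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    p_observed = prob(a)
    # Sum the probabilities of all tables no more likely than the observed one.
    return sum(
        p for p in (prob(x) for x in range(lowest, highest + 1))
        if p <= p_observed * (1 + 1e-9)
    )

# Rows: CD patients, controls; columns: community type I, type II.
p_value = fisher_exact_two_sided(61, 35, 24, 0)
print(f"Two-sided Fisher's exact p = {p_value:.2g}")
```

Enumerating every table with the observed margins keeps the calculation exact, which matters here because one cell is zero and asymptotic tests would be unreliable.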

Analysis of Microbiota Community Types in CD Patients
When comparing the two types of communities identified in CD patients, the second type showed a significant decrease in the number of observed OTUs and Shannon's alpha diversity index, indicating a more prominent dysbiosis (Figure 5A, Table S1). A decrease in the abundance of the Bacteroidetes phylum and an increase in the Proteobacteria, Fusobacteria, and Verrucomicrobia phyla were also observed. Moreover, the abundance of the Bacteroidaceae, Prevotellaceae, Lachnospiraceae, Ruminococcaceae, and Erysipelotrichaceae families and of unclassified Clostridiales was significantly decreased in the second community type of CD patients (Figure 5B, Table S1). These bacteria are members of the normal human microbiota and play a role in maintaining intestinal homeostasis. An increased abundance of the Verrucomicrobiaceae, Enterococcaceae, Streptococcaceae, and Enterobacteriaceae families was also found (Figure 5B, Table S1). Thus, the microbiome of CD patients with community type II is characterized by prominent dysbiosis, while the microbiome of patients with the first type is more similar to that of healthy individuals.

Analysis of Clinical Parameters in CD Patients with Different Types of Microbial Communities
When comparing CD patients with different types of microbial communities, no significant differences were found in clinical characteristics: duration of disease, location of inflammation (ileitis, colitis, ileocolitis), disease activity (based on the Crohn's disease activity index), phenotype of disease (inflammatory, stricturing, fistulizing), or stool frequency (Table 2).

SNP Analysis in CD Patients and Healthy Volunteers
All 24 genetic markers conformed to Hardy-Weinberg equilibrium proportions in the control population (p > 0.05). Allele frequencies of 8 genetic polymorphisms were significantly different between CD patients and healthy subjects (Table 3). The alleles rs1004819A and rs11209026G of the IL23R gene, as well as rs2241880A (ATG16L1), rs4958847A (IRGM), rs1992662G (PTGER4), rs2274910C (ITLN1), rs6601764T, and rs7807258C, were found to be more frequent in patients with CD.
In the group of CD patients with the second type of gut microbiota community, the frequencies of the following alleles were significantly increased: A in rs9858542 of the BSN gene, T in rs3816769 of the STAT3 gene, and C in rs1793004 of the NELL1 gene (Table 4). All of these alleles are associated with an increased risk of CD [27][28][29][30][31][32][33].
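The Hardy-Weinberg check mentioned above can be sketched as a chi-square goodness-of-fit test with one degree of freedom. The genotype counts below are hypothetical (they are not taken from Table 3), and the function name is an illustrative assumption.

```python
from math import erfc, sqrt

def hardy_weinberg_chi2(n_aa, n_ab, n_bb):
    """Chi-square test of genotype counts against Hardy-Weinberg proportions."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)   # frequency of allele A
    q = 1 - p                         # frequency of allele B
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Survival function of the chi-square distribution with 1 degree of freedom.
    p_value = erfc(sqrt(chi2 / 2))
    return chi2, p_value

# Hypothetical genotype counts for 24 controls (AA, AB, BB).
chi2_ok, p_ok = hardy_weinberg_chi2(6, 12, 6)      # matches HWE expectations
chi2_dev, p_dev = hardy_weinberg_chi2(10, 4, 10)   # heterozygote deficit
print(f"in equilibrium: chi2 = {chi2_ok:.2f}, p = {p_ok:.3f}")
print(f"deviating:      chi2 = {chi2_dev:.2f}, p = {p_dev:.4f}")
```

A marker is treated as consistent with equilibrium when p > 0.05, as in the text. With only 24 controls, expected counts for rare genotypes can be small, so an exact test may be preferable in practice.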

Correlation between SNP and Taxonomic Composition of Gut Microbiota in CD Patients
Statistically significant negative correlations of rs9858542 (BSN) with the number of observed OTUs and the representation of the Bacteroidetes phylum were revealed using an additive model (Figure 6). The rs3816769 (STAT3) variant showed a negative correlation with the phylum Bacteroidetes and especially with the family Bacteroidaceae. For rs1793004 (NELL1), a negative correlation was found with the family Ruminococcaceae and a positive correlation with the family Enterococcaceae. Furthermore, the abundance of Bacteroidaceae showed significant negative correlations with rs2274910 (ITLN1), rs2522057 (IRF1-AS1), rs224136 (intergenic), rs6908425 (CDKAL1), and rs12037606 (intergenic) and positive correlations with rs1992662 (PTGER4), rs1456893 (intergenic), and rs13361189 (IRGM). The Enterococcaceae and Enterobacteriaceae families showed significant positive correlations with rs224136 (intergenic).
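The additive model used for these correlations can be sketched as follows: each genotype is coded by its number of risk alleles (0, 1, or 2), and the Spearman rank correlation is computed between this code and the relative abundance of a taxon. The genotypes and abundances below are hypothetical, and the pure-Python functions are a stand-in for the R "psych" package named in the Statistical Analysis section.

```python
from math import sqrt

def average_ranks(values):
    """Rank values from 1..n, assigning tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        tied_rank = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = tied_rank
        i = j + 1
    return ranks

def pearson(x, y):
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

def spearman(x, y):
    """Spearman's rho: Pearson correlation of the tie-corrected ranks."""
    return pearson(average_ranks(x), average_ranks(y))

def additive_code(genotype, risk_allele):
    """Additive genetic model: number of risk alleles in the genotype."""
    return genotype.count(risk_allele)

# Hypothetical data: genotypes at one SNP and relative abundance of one family.
genotypes = ["GG", "AG", "AA", "AG", "GG", "AA"]
abundance = [0.30, 0.22, 0.10, 0.18, 0.35, 0.08]
codes = [additive_code(g, "A") for g in genotypes]
rho = spearman(codes, abundance)
print(f"Spearman rho = {rho:.2f}")
```

A negative rho in this sketch means that carriers of more risk alleles tend to have a lower abundance of the taxon. The sketch does not compute a p value and assumes that neither variable is constant.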

Discussion
Changes in the gut microbiota composition and role of genetics in CD patients have been described in a number of studies. However, there is limited data on CD patients in the Russian population. CD prevalence in Russia is estimated to be 3.0-7.88 cases per 100,000 population [34,35], and it rises 8-10% annually [35], but it is still substantially lower than in Western Europe and North America [36]. Patients in our study were recruited from two regions of Russia (the Republic of Tatarstan and Moscow), ensuring that people of different nationalities (mainly Russians and Tatars) were represented.
Our results indicate a decrease in the diversity of the gut microbiota in CD patients compared to healthy volunteers, which has also been found in many other studies [37][38][39]. Changes in the abundance of the families Bacteroidaceae, Prevotellaceae, Clostridiaceae, Lachnospiraceae, Ruminococcaceae, Erysipelotrichaceae, Enterobacteriaceae, Fusobacteriaceae, Lactobacillaceae, Enterococcaceae, and Streptococcaceae are often detected. The families Bacteroidaceae, Prevotellaceae, and Rikenellaceae are members of the phylum Bacteroidetes and perform several important functions in the gut, including metabolizing proteins and carbohydrates [40], producing butyrate [41], and preventing the colonization of the gastrointestinal tract by pathogenic bacteria [42]. In our study, among the members of the highly abundant phylum Bacteroidetes in CD patients, only the Rikenellaceae family decreased significantly compared to controls. The functions of this family in the gut microbiota have not yet been studied, but there is evidence of its decrease in patients with IBD and an increase in patients with irritable bowel syndrome [43]. Among the representatives of the phylum Firmicutes, there was a decrease in the proportion of the order Clostridiales and, in particular, of the family Clostridiaceae. Members of this order are known to produce short-chain fatty acids (SCFAs) and are involved in the metabolism of bile acids. The data on this taxon are conflicting: while some authors observed an increase of Clostridiaceae in healthy controls and a decrease in CD patients [44][45][46][47], others found an increase of this taxon in IBD patients [48,49]. In our study, we found a decreased abundance of the Coriobacteriaceae family of the Actinobacteria phylum in CD patients, which is consistent with previous studies [50][51][52]. Coriobacteriaceae have important functions in the gut, including the conversion of bile salts and steroids and the activation of dietary polyphenols [53].
An increase of the Enterobacteriaceae family members was also found in the microbiota of CD patients. This is consistent with previously reported data in which the increased representation of this family was a marker of dysbiosis in IBD [8,54]. However, no association of any E. coli virulence genes with CD was found in the Russian population [55]. In our study, we found an increase in the proportion of lactic acid producing bacteria from the Lactobacillaceae, Enterococcaceae, and Streptococcaceae families in patients with CD, which is consistent with previous studies [9,[56][57][58][59][60]. These bacteria are commensals; however, they can sometimes cause inflammation of various tissues in the respiratory, cardiovascular, and nervous systems [61][62][63][64]. Streptococci are known to provoke intestinal inflammation by inducing a pro-inflammatory response to lipoproteins and other components, as well as to the interaction of subtilisin-like protease (SspA) with Toll-like receptor 2 (TLR2) [65]. The role of enterococci in the pathogenesis of IBD has been described in a study showing that Enterococcus faecalis can cause IBD in the IL-10 knockout mouse model [66]. A pathogenicity island encoding surface aggregating protein (asa1), gelatinase (gelE), cytolysin (cylA), extracellular surface protein (esp), and hyaluronidase (hyl) was also identified as a possible trigger of the host inflammatory response [67]. Whether lactobacilli can provoke IBD or are simply adapted to survive in an inflamed gut is still an open question.
Many studies have attempted to identify bacterial taxa that change with IBD activity or severity. Many of them agree that Faecalibacterium prausnitzii is associated with minimal inflammation [68][69][70]. However, the results for other taxa are conflicting. We found an increase in the representation of the Enterococcaceae and Micrococcaceae families in the gut microbiota of patients with more severe CD, which is consistent with the results of other studies [71][72][73]. In addition, we found a decrease in the abundance of the Erysipelotrichaceae and Coriobacteriaceae families with higher disease activity. A similar trend for Erysipelotrichaceae, but the opposite trend for Coriobacteriaceae, was observed by Papa et al. [74]. In our study, similar to that of Tedjo et al., Bacteroidaceae were increased in patients with mild CD [75], whereas other authors found the opposite [46,75,76]. Therefore, there is no clear understanding of how the microbiota composition of IBD patients varies with disease severity.
According to our data, the microbiota of CD patients is heterogeneous, and two types of communities can be identified. Patients with a type I microbiota community shared it with control samples. Patients with a type II microbiota community are characterized by a lower diversity of the microbiota and a lower number of observed OTUs compared with type I, indicating a more severe dysbiosis. In a study by Vieira-Silva et al., a similar method revealed four enterotypes, whose drivers were Ruminococcaceae, Prevotella, and Bacteroides [70]. The microbiota enterotypes of the Japanese, European, and American populations are characterized by the same taxa [77]. Other enterotypes were identified in a model organism study by Barron et al., where the main driver taxa were Lachnospiraceae and Ruminococcaceae, Enterobacteriaceae and Lactobacillus, Erysipelotrichaceae and Akkermansia [78]. In our study, a number of bacterial families were represented differently in the microbiota community types. Study participants with community type II had an increased abundance of the Enterobacteriaceae, Enterococcaceae, and Streptococcaceae families, which, as noted above, is typical of the gut microbiota of CD patients. In addition, the abundance of Verrucomicrobiaceae, whose role in the pathogenesis of IBD is actively debated, was increased. Some authors noted a decrease of its representation in IBD and even suggested the use of Akkermansia muciniphila as a next-generation probiotic [9,[78][79][80], while others showed its increase in the microbiota of CD patients and suggested that it degrades the mucin of the intestinal mucosa, thereby provoking its inflammation [8]. There was also a decrease in the abundance of Bacteroidaceae, Prevotellaceae, Lachnospiraceae, Ruminococcaceae, Erysipelotrichaceae, and unclassified Clostridiales in the type II microbiota community of CD patients. These bacteria belong to the normal microbiota and are important in keeping the gut healthy. Thus, the second type of microbial community is characterized by more prominent dysbiotic changes in the microbiota of CD patients.
We found no differences in the clinical characteristics of CD (duration of disease, location of inflammation, disease activity, and stool frequency) between the two groups of patients with different types of gut microbiota communities, suggesting the presence of other reasons for this distribution.
As CD is a multifactorial disease, genetic factors may be responsible for differences in the gut microbiota composition. We studied 24 single nucleotide polymorphisms that have previously been associated with CD in various populations. Compared with controls, patients with CD had significantly higher allele frequencies for 8 SNPs. For the remaining 16 SNPs, no significant differences were found, probably due to the regional characteristics of the Russian population or the limited size of the cohort. It is known that the representation of some bacterial taxa in the intestinal microbiota is associated with specific alleles of host SNPs. For example, polymorphisms in the LCT gene determine the percentage of Bifidobacterium in the gut microbiota of healthy individuals [81], which can be explained by bacterial enzymes compensating for lactase deficiency. There are also data on the relationship of representatives of Akkermansia, Anaerostipes, Clostridiaceae, Blautia, Dialister, Bacteroides, Atopobium, etc. with various host genetic loci, but the mechanism of these relationships has not been studied [81][82][83]. In the case of IBD, a high abundance of Enterobacteriaceae was found in the microbiota of NOD2-deficient patients [84]. Certain polymorphisms in the FUT2 gene were associated with decreased SCFA-producing Faecalibacterium and increased Proteobacteria [85]. It is also known that the ATG16L1 T300A variant is associated with an increased abundance of the Bacteroides genus [17].
In our study, the rs9858542A allele in the BSN gene was found to be more frequent in CD patients with the second, dysbiotic type of microbiota community and negatively correlated with the number of observed OTUs and the representation of the Bacteroidetes phylum. The rs9858542A allele is known to be associated with an increased risk of CD [27][28][29]. The BSN gene encodes Bassoon Presynaptic Cytomatrix Protein, which is involved in organizing the presynaptic cytoskeleton and is expressed primarily in brain neurons, although there is evidence that this protein is also expressed at low levels in enteroendocrine cells in the gastrointestinal tract, including the stomach, duodenum, colon, and rectum [86]. These cells produce gut hormones that control digestion, food absorption, insulin secretion, etc. [87]. It is also known that the gut microbiota produces several metabolites (SCFAs, secondary bile acids, indoles, and lipopolysaccharides) that stimulate enteroendocrine cells [88][89][90][91][92]. The mechanism by which the BSN gene product interacts with the intestinal microbiota is still unknown, but it probably involves the interplay of microbiota metabolites with host enteroendocrine cells.
We also found that the T allele in rs3816769 of the STAT3 gene is significantly more frequent in CD patients with the second, dysbiotic type of intestinal microbiota and negatively correlates with the Bacteroidetes phylum and the Bacteroidaceae family in particular. This variant is also known to be associated with CD risk [30,93]. The transcription factor STAT3 (signal transducer and activator of transcription 3) regulates apoptosis, cell growth, and inflammation in response to internal and external stimuli. In animal models, STAT3 activation in intestinal epithelial cells is required for wound healing, but also leads to the development of colitis-associated cancer in chronic inflammation [92,94]. Additionally, STAT3-deficient mice have increased sensitivity to bacterial lipopolysaccharide and increased levels of proinflammatory cytokines, and are more prone to chronic enterocolitis [95]. Zhao et al. found that microbial SCFAs activate STAT3 in intestinal epithelial cells, while STAT3 knockout resulted in a decrease in SCFA-induced antimicrobial peptide production [96]. Therefore, the STAT3 gene variant rs3816769T may affect the host-microbiota interaction.
The C allele of rs1793004 in the NELL1 gene was significantly more frequent in CD patients with the second, dysbiotic microbiota type. Furthermore, a negative correlation of this variant with the Ruminococcaceae family and a positive correlation with Enterococcaceae were found. NELL1 encodes neural epidermal growth factor-like 1, which is expressed at significant levels in epithelial cells of the small and large intestine, including inflamed epithelium [97]. The association of rs1793004C with IBD has been demonstrated by a genome-wide association study in a German population of IBD patients [97]. However, the mechanisms by which the NELL1 gene product interacts with the intestinal microbiota remain unknown.
The findings of this study regarding the association between genetic polymorphisms and intestinal microbiota composition may help in developing personalized therapy for CD patients. Probiotics are considered a promising treatment for various autoimmune diseases: type 1 diabetes [98], multiple sclerosis [99], autoimmune hepatitis [100], rheumatoid arthritis [101], etc. Such therapy may include traditional probiotics (based on lactobacilli and bifidobacteria), next-generation probiotics (based on Faecalibacterium prausnitzii [102] or Akkermansia muciniphila [103]), or fecal microbiota transplantation [104].
The limitation of the study is the relatively small number of healthy volunteers. However, differences in the microbiota of CD patients and healthy controls have been described in many previous works, while the variability of the microbiota within a group of CD patients is much less discussed. For this reason, we decided to study a larger number of CD patients for more reliable results. Taking these limitations into account, further investigations on associations of microbiota and genetic markers in both CD patients and healthy controls are required.  Table S2.

Ethics Statement
Informed consent was obtained from all subjects involved in the study. The study was conducted in accordance with the recommendations of the local ethics committee of the Kazan Federal University, Kazan, Russia (Protocol No. 6, dated 13 October 2017) and the Interuniversity ethics committee, Moscow, Russia (Protocol No. 8, dated 23 September 2021).

16S rRNA Gene-Based Metagenomic Analysis of Stool Samples
Genomic DNA was extracted from fecal samples using the QIAamp DNA Stool Mini Kit (Qiagen, Germantown, MD, USA) in accordance with the manufacturer's instructions. A 16S rRNA sequencing library was constructed according to the 16S metagenomics sequencing library preparation protocol (Illumina, San Diego, CA, USA) targeting the V3 and V4 hypervariable regions of the 16S rRNA gene. The initial PCR was performed with template DNA using region-specific primers shown to have compatibility with the Illumina index and sequencing adapters (forward primer: 5'-TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGTCGTCGGCAGCGTCAGATGTGTATAAGAGACAGCCTACGGGNGGCWGCAG-3'; reverse primer: 5'-GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGGTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGGACTACHVGGGTATCTAATCC-3'). After purification of PCR products with AMPure XP magnetic beads, the second PCR was performed using primers from a Nextera XT Index Kit (Illumina). Subsequently, purified PCR products were visualized using gel electrophoresis and quantified with a Qubit dsDNA HS Assay Kit (Thermo Scientific, Waltham, MA, USA) on a Qubit 2.0 fluorometer. The sample pool (4 nM) was denatured with 0.2 N NaOH, diluted further to 4 pM, and combined with 20% (v/v) denatured 4 pM PhiX, prepared following Illumina guidelines. Sequencing of 16S rRNA gene V3-V4 variable regions was performed on the Illumina MiSeq platform in 2 × 300 bp mode at the Interdisciplinary Center of Shared Use of Kazan Federal University.
Reads were further processed and analyzed using the QIIME software, version 1.9.1 [105] according to protocols. Before filtering, there were 53,175-182,362 (median 91,061) read pairs per sample. Paired-end reads were initially merged and then processed to remove low-quality and chimeric sequence data. The rarefaction step was performed to reduce sequencing depth heterogeneity between samples. After quality filtering, chimera filtering, and rarefying, we analyzed on average 20,938 joined read pairs per sample. Sequences were clustered into operational taxonomic units (OTUs) based on a 97% identity threshold (open-reference OTU picking strategy); the SILVA database v.138 [106] was used. To characterize the richness and evenness of the bacterial community, alpha diversity was calculated using Shannon's index.
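The rarefaction and alpha diversity steps described above can be illustrated with a minimal sketch: reads are subsampled without replacement to a common depth, and Shannon's index is then computed from the resulting OTU proportions. The OTU names, counts, depth, and the base-2 logarithm are assumptions made for illustration; this is not the QIIME implementation.

```python
import random
from collections import Counter
from math import log

def rarefy(otu_counts, depth, seed=0):
    """Subsample `depth` reads without replacement from a dict of OTU counts."""
    if depth > sum(otu_counts.values()):
        raise ValueError("sample has fewer reads than the rarefaction depth")
    # Expand counts into one entry per read, then draw a random subset.
    pool = [otu for otu, n in otu_counts.items() for _ in range(n)]
    rng = random.Random(seed)
    return Counter(rng.sample(pool, depth))

def shannon_index(otu_counts, base=2):
    """Shannon diversity H = -sum(p_i * log(p_i)) over OTUs with nonzero counts."""
    total = sum(otu_counts.values())
    return -sum(
        (n / total) * log(n / total, base)
        for n in otu_counts.values() if n > 0
    )

# Hypothetical sample with three OTUs, rarefied to 100 reads.
sample = {"OTU_a": 500, "OTU_b": 300, "OTU_c": 200}
rarefied = rarefy(sample, depth=100)
print("reads after rarefying:", sum(rarefied.values()))
print("observed OTUs:", len(rarefied))
print(f"Shannon index: {shannon_index(rarefied):.3f}")
```

Rarefying every sample to the same depth makes richness-sensitive measures such as observed OTUs comparable between samples, at the cost of discarding reads from deeply sequenced samples.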

Genotyping
A total of 24 SNPs were selected based on data indicating their potential association with risk for IBD (Table S3). Genomic DNA from venous blood was isolated and purified using the QIAamp DNA Mini Kit (Qiagen, Germantown, MD, USA) as described by the manufacturer. PCR amplification was performed using the primers listed in Table S3 according to the protocol [107]. Genotyping was performed using MALDI-TOF mass spectrometry as described previously [107].

Statistical Analysis
The distribution of genotypes for all SNPs was tested for compliance with the Hardy-Weinberg equilibrium using the chi-square test. Analysis of the allele frequencies was done using Fisher's exact test. The strength of associations was assessed using the odds ratio (OR, (lower 95% confidence interval; upper 95% confidence interval)). Differences in the taxonomic composition of the gut microbiota were assessed using the Kruskal-Wallis test. Correlations between genotypes and gut microbiota composition were analyzed using the R "psych" package [108] based on Spearman's rank correlation coefficient using an additive genetic model (depending on the genotype, a higher risk of developing CD corresponds to a higher rank). p values < 0.05 were considered significant. To determine the types of bacterial communities, the Dirichlet multinomial mixture algorithm was applied to cluster the gut microbiota samples [109].
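The odds ratio with its 95% confidence interval can be sketched as below. The text does not state how the interval was computed, so the Woolf (log-normal) approximation and the Haldane-Anscombe correction for zero cells are assumptions, as are the function name and the allele counts in the example.

```python
from math import exp, log, sqrt

def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio and Woolf 95% CI for the 2x2 table [[a, b], [c, d]].

    a, b: risk / other allele counts in cases;
    c, d: risk / other allele counts in controls.
    """
    if min(a, b, c, d) == 0:
        # Haldane-Anscombe correction (assumption): add 0.5 to every cell.
        a, b, c, d = (x + 0.5 for x in (a, b, c, d))
    odds_ratio = (a * d) / (b * c)
    se_log_or = sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = exp(log(odds_ratio) - z * se_log_or)
    upper = exp(log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper

# Hypothetical allele counts: 96 patients (192 alleles), 24 controls (48 alleles).
or_value, ci_low, ci_high = odds_ratio_ci(120, 72, 20, 28)
print(f"OR = {or_value:.2f} ({ci_low:.2f}; {ci_high:.2f})")
```

An interval that excludes 1 indicates an association at the 5% level under this approximation, which complements the exact p value obtained from Fisher's test.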
Supplementary Materials: The supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/ijms24097998/s1. References are cited in the supplementary materials.

Informed Consent Statement:
Informed consent was obtained from all subjects involved in the study.

Data Availability Statement:
The data presented in this study are available on request from the corresponding author. Raw reads are deposited in the NCBI SRA under accession number PRJNA938107 in the fastq format (https://www.ncbi.nlm.nih.gov/sra/PRJNA938107, accessed on 30 March 2023).

Conflicts of Interest:
The authors declare no conflict of interest.